Active Learning Query Selection with Historical Information

Authors

  • Michael Davy
  • Aisling Linehan
Abstract

This work describes novel methods and techniques to decrease the cost of employing active learning in text categorisation problems. The cost of performing active learning is a combination of labelling effort and computational overhead. Reducing this cost allows accurate classifiers to be constructed inexpensively, increasing the number of real-world problems where machine learning solutions can be successfully applied. In this thesis we investigate strategies and techniques to reduce both computational expense and labelling effort in active learning. Critical to the success of active learning is the query selection strategy, which is responsible for identifying informative unlabelled examples. Selecting only the most informative examples reduces labelling effort, as redundant and uninformative examples are ignored. The majority of query selection strategies select queries based on the labelling predictions of the current classifier. This thesis suggests that information from prior iterations of active learning can help select more informative queries in the current iteration. We propose History-based query selection strategies, which incorporate predictions from prior iterations of active learning into the selection of the current query. These strategies have been shown to increase the accuracy of classifiers produced using active learning, thereby reducing labelling effort. In addition, History-based query selection strategies are very efficient, since information is reused from previous iterations of active learning. Another contributing factor to the cost of active learning is computational expense. Query selection strategies can require considerable computation to identify the most informative examples. We investigate pre-filtering optimisation for the computationally inefficient error reduction sampling (ERS) query selection strategy. Pre-filtering restricts the number of unlabelled examples considered to a small subset of the pool, constructed using a query selection strategy. Optimising ERS using pre-filtering was found to simultaneously reduce both computational overhead and labelling effort.
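
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's actual formulation, which the abstract does not give. The sketch assumes that each pool example keeps a list of the class distributions predicted for it in earlier iterations, that "history" means averaging entropy over a recent window, and that pre-filtering ranks the pool by current entropy. All function names and the `window` and `size` parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Assumption: history[example] is the list of class distributions predicted
# for that example in successive active learning iterations (oldest first).
History = Dict[str, List[Sequence[float]]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def history_uncertainty(predictions: List[Sequence[float]], window: int = 3) -> float:
    """Mean entropy over the last `window` stored predictions (illustrative choice)."""
    recent = predictions[-window:]
    return sum(entropy(p) for p in recent) / len(recent)


def select_history_query(history: History, window: int = 3) -> str:
    """Pick the example that has been uncertain across recent iterations."""
    return max(history, key=lambda ex: history_uncertainty(history[ex], window))


def prefilter(history: History, size: int) -> List[str]:
    """Keep only the `size` examples with the most uncertain current prediction.

    An expensive strategy (ERS in the abstract) would then score only this subset.
    """
    ranked = sorted(history, key=lambda ex: entropy(history[ex][-1]), reverse=True)
    return ranked[:size]
```

Because the stored predictions are by-products of earlier iterations, the history score costs no extra classifier training, which matches the efficiency argument made in the abstract.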

Similar resources

Query Selection via Weighted Entropy in Graph-Based Semi-supervised Classification

There has recently been a large effort in using unlabeled data in conjunction with labeled data in machine learning. Semi-supervised learning and active learning are two well-known techniques that exploit the unlabeled data in the learning process. In this work, active learning is used to query a label for an unlabeled data point on top of a semi-supervised classifier. This work focuses on the que...

Active Learning-Based Elicitation for Semi-Supervised Word Alignment

Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling, we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain a...

Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection

This paper investigates and evaluates support vector machine active learning algorithms for use with imbalanced datasets, which commonly arise in many applications such as information extraction. Algorithms based on closest-to-hyperplane selection and query-by-committee selection are combined with methods for addressing imbalance such as positive amplification based on prevalence st...

Optimistic Active-Learning Using Mutual Information

An “active learning system” will sequentially decide which unlabeled instance to label, with the goal of efficiently gathering the information necessary to produce a good classifier. Some such systems greedily select the next instance based only on properties of that instance and the few currently labeled points — e.g., selecting the one closest to the current classification boundary. Unfortuna...

Multi-Label Active Learning: Query Type Matters

Active learning reduces the labeling cost by selectively querying the most valuable information from the annotator. It is especially important for multi-label learning, where the labeling cost is rather high because each object may be associated with multiple labels. Existing multi-label active learning (MLAL) research mainly focuses on the task of selecting instances to be queried. In this pap...

Publication date: 2009